Abstract. A number of representation schemes have been presented for use
within Learning Classifier Systems, ranging from binary encodings to neural
networks. This paper presents results from an investigation into using a
discrete dynamical system representation within the XCS Learning Classifier
System. In particular, asynchronous random Boolean networks are used to
represent the traditional condition-action production system rules. It is
shown possible to use self-adaptive, open-ended evolution to design an
ensemble of such discrete dynamical systems within XCS to solve a number of
well-known test problems.



1 Introduction 

Traditionally, Learning Classifier Systems (LCS) [17] use a ternary encoding to
generalize over the environmental inputs and to associate appropriate actions. 
However, a number of representations beyond this scheme have previously been
presented, including real numbers WV , LISP S-expressions [23], fuzzy logic [37] and
neural networks [5]. To date, no temporally dynamic representation schemes have
been used in LCS, a potentially important approach since temporal behaviour 
of such kinds is viewed as a significant aspect of cognition in general. 

In this paper we explore the use of a dynamical system representation within 
XCS [40], herein termed dynamical genetic programming (DGP). Traditional
tree-based genetic programming (GP) \TD has been used within LCS both to
calculate the action [1] and to represent the condition [24]. DGP uses a
graph-based representation, each node of which is constantly updated with asyn- 
chronous parallelism, and evolved using an open-ended, self-adaptive scheme. In 
the discrete case, each node is a Boolean function and therefore equivalent to a 
form of random Boolean network (RBN) (e.g., [10]). We show that XCS is able
to solve a number of well-known immediate and delayed reward tasks using this 
temporally dynamic knowledge representation scheme. 



2 Related Work 



A number of representations have been presented by which to enable the evolu- 
tion of computer programs, the most common being tree-based LISP S-expressions 
[24] . Other forms of GP include the use of machine code instructions (e.g., [4]) 
and finite state machines (e.g., [13]). Most relevant to the form of GP used in 
this paper is the small amount of prior work on graph-based representations. 
Teller and Veloso's [30] neural programming uses a directed graph of connected 
nodes, each with functionality defined in the standard GP way, with recursive 
connections included. Significantly, each node is executed with synchronous
parallelism for some number of cycles before an output node's value is taken.
Poli (e.g., [31]) presented a very similar scheme wherein the graph is placed
over a two-dimensional grid and executes its nodes synchronously in parallel.
Other examples of graph-based GP typically contain sequentially updating nodes
(e.g., [27]). Schmidt and Lipson [3^ have recently demonstrated a number of benefits
from graph encodings over traditional trees, such as reduced bloat and increased 
computational efficiency. 

As noted above, tree-based S-expressions have been used within LCS. Recently,
Wilson [42] has explored the use of a form of Gene Expression Programming
(GEP) [12] within LCS. Here the rules are represented as expression trees
which are evaluated by assigning the environmental inputs to the tree's
terminals, evaluating the tree, and then comparing the result with a predetermined
threshold. Whenever the threshold value is exceeded, the rule is added to the 
match set. 

The most common form of discrete dynamical system is the Cellular Automaton
(CA) [38] which consists of an array of cells (lattice of nodes) where
the cells exist in states from a finite set and update their states with synchronous 
parallelism in discrete time. Traditionally, each cell calculates its next state de- 
pending upon its current state and the states of its closest neighbours. That is, 
CAs may be seen as a graph with a (typically) restricted topology. Packard [30] 
was the first to use evolutionary computing techniques to design CAs such that 
they exhibit a given emergent global behaviour. Following Packard, Mitchell et
al. (e.g., [12) have investigated the use of a Genetic Algorithm (GA) [16] to learn
the rules of uniform binary CAs. As in Packard's work, the GA produces the
entries in the update table used by each cell, candidate solutions being
evaluated with regard to their degree of success for the given task. Andre et
al. [5] repeated Mitchell et al.'s work whilst using traditional GP to evolve
the update rules. They report similar results. Sipper (e.g., [33]) presented a
non-uniform, or heterogeneous, approach to evolving CAs. Each cell of a one- or
two-dimensional CA is
also viewed as a GA population member, mating only with its lattice neighbours 
and receiving an individual fitness. He shows an increase in performance over 
Mitchell et al.'s work by exploiting the potential for spatial heterogeneity in the
tasks. Sipper and Ruppin [33] extended this approach to enable heterogeneity 
in the node connectivity, along with the node function; they evolved a form of 
random Boolean networks. 



3 Random Boolean Networks 



The discrete dynamical systems known as Random Boolean Networks (RBN) 
were originally introduced by Kauffman (see [20]) to explore aspects of
biological genetic regulatory networks. Since then they have been used as a
tool in a wide range of areas, such as self-organisation (e.g., [5^) and
computation. An RBN typically consists of a network of N nodes, each performing a
Boolean function with K inputs from other nodes in the network, all updating
synchronously (see Figure 1). As such, RBN may be viewed as a generalisation
of binary Cellular Automata (CA) [38] and unorganized machines PS^. Since
they have a finite number of possible states ($2^N$) and they use deterministic
Boolean functions, the dynamics of RBN eventually fall into a basin of attrac- 
tion. It is well-established that the value of K affects the emergent behaviour of 
RBN wherein attractors typically contain an increasing number of states with 
increasing K. Three phases of behaviour are suggested: ordered when K = 1,
with attractors consisting of one or a few states; chaotic when K > 3, with a
very large number of states per attractor; and, a critical regime around K = 2,
where similar states lie on trajectories that tend to neither diverge nor converge
and 5-15% of nodes change state per attractor cycle (see [20] for discussions of
this critical regime, e.g., with respect to perturbations). Analytical methods have
been presented by which to determine the typical time taken to reach a basin
of attraction and the number of states within such basins for a given degree of
connectivity K (see [20]).
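
The synchronous dynamics described above can be illustrated with a minimal
sketch. It is not code from this study: the function names, the 0-based node
indexing and the fixed random seed are assumptions made for the example. It
builds a random network with N nodes of K inputs each and iterates it until a
global state repeats, which must happen because there are only finitely many
states.

```python
import random

def make_rbn(n, k, rng):
    """Random Boolean network: every node gets k input indices and a
    truth table with 2**k entries (illustrative encoding only)."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def node_next(state, node_inputs, table):
    # Pack the input bits into an index into the node's look-up table.
    index = 0
    for src in node_inputs:
        index = (index << 1) | state[src]
    return table[index]

def sync_step(state, inputs, tables):
    """Update every node at once from the previous global state."""
    return tuple(node_next(state, inputs[i], tables[i])
                 for i in range(len(state)))

def find_attractor(state, inputs, tables):
    """Iterate until a state repeats; return (transient length, cycle length)."""
    seen = {}
    t = 0
    while state not in seen:
        seen[state] = t
        state = sync_step(state, inputs, tables)
        t += 1
    return seen[state], t - seen[state]

rng = random.Random(0)
inputs, tables = make_rbn(n=8, k=2, rng=rng)
start = tuple(rng.randint(0, 1) for _ in range(8))
transient, cycle = find_attractor(start, inputs, tables)
```

For a hand-built network of two nodes that simply copy each other, starting
from (0, 1) gives no transient and a cycle of length two, since the two bits
swap on every step.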

Fig. 1: Example Random Boolean Network and node encoding.



Closely akin to the work described here, Kauffman [20] describes the use
of simulated evolution to design RBN which must play a (mis)matching game
wherein mutation is used to change connectivity, the Boolean functions, K and
N. He reports the typical emergence of high fitness solutions with K = 2 to 3,
together with an increase in N over the initialised size. 

As noted above, traditional RBN consist of N nodes updating synchronously 
in discrete time steps, but asynchronous versions have also been presented,
after [15], leading to a classification of the space of possible forms of RBN [14].
Asynchronous forms of CA have also been explored (e.g., [19]) wherein it is often 
suggested that asynchrony is a more realistic underlying assumption for many 
natural and artificial systems. 

Asynchronous logic devices are known to have the potential to consume less 
power and dissipate less heat which may be exploitable during efforts to- 
wards hardware implementations of such systems. Asynchronous logic is also 
known to have the potential for improved fault tolerance, particularly through 
delay insensitive schemes (e.g., 0). This may also prove beneficial for hardware 
implementations. 

Harvey and Bossomaier [15] showed that asynchronous RBN exhibit either 
point attractors, as seen in asynchronous CAs, or "loose" attractors where "the 
network passes indefinitely through a subset of its possible states" [ibid.] (as 
opposed to distinct cycles in the synchronous case). Thus the use of asynchrony
represents another feature of RBN with the potential to significantly alter their 
underlying dynamics thereby offering another mechanism by which to aid the 
simulated evolutionary design process for a given task. Di Paolo [10] showed it 
is possible to evolve asynchronous RBN which exhibit rhythmic behaviour at 
equilibrium. Asynchronous CAs have also been evolved (e.g., [34]). 

4 Discrete DGP-XCS 

To use asynchronous RBN as the rules within XCS, the following scheme is 
adopted. Each of an initial randomly created rule's nodes has K randomly
assigned connections, here 1 < K < 5. There are as many nodes N as input fields
I for the given task and its outputs O, plus one other, as will be described, i.e.,
N = I + O + 1. The first connection of each input node is set to the corresponding
locus of the input message. The other connections are assigned at random within
the RBN as usual. In this way, the current input state is always considered along
with the current state of the RBN itself per network update cycle by such nodes
(see Figure 2). Nodes are initialised randomly each time the network is run to
determine [M], etc. The population is initially empty and covering is applied to 
generate rules as in the standard XCS approach. 

Fig. 2: An evolved dDGP-XCS 6-bit MUX asynchronous rule.

Matching consists of executing each rule for T cycles based on the current
input. The value of T is chosen to be a value typically within the basin of
attraction of the RBN. Asynchrony is here implemented as a randomly chosen
node being updated on a given cycle, with as many updates per overall network
update cycle as there are nodes in the network before an equivalent cycle to one
in the synchronous case is said to have occurred. See [14] for alternative schemes.
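
The asynchronous updating just described can be sketched as follows. This is
an illustration under stated assumptions rather than the implementation used
in the experiments: connection values below the message length address input
bits and higher values address nodes, indices are 0-based, and all names are
hypothetical.

```python
import random

def read_source(conn, message, state):
    """Connections 0..len(message)-1 address input bits; higher values
    address nodes (this combined addressing is an assumption)."""
    if conn < len(message):
        return message[conn]
    return state[conn - len(message)]

def update_node(i, message, state, conns, tables):
    index = 0
    for conn in conns[i]:
        index = (index << 1) | read_source(conn, message, state)
    state[i] = tables[i][index]  # the new value is visible to later updates

def async_cycle(message, state, conns, tables, rng):
    """One network cycle: as many single-node updates as there are nodes,
    each applied to a node drawn uniformly at random."""
    for _ in range(len(state)):
        update_node(rng.randrange(len(state)), message, state, conns, tables)

def run_rule(message, state, conns, tables, t_cycles, rng):
    """Execute a rule for t_cycles network cycles on a copy of the state."""
    state = list(state)
    for _ in range(t_cycles):
        async_cycle(message, state, conns, tables, rng)
    return state

# Node 0 copies input bit 0; node 1 negates node 0.
conns = [[0], [1]]
tables = [[0, 1], [1, 0]]
final = run_rule([1], [0, 1], conns, tables, t_cycles=25, rng=random.Random(0))
```

Because nodes are drawn with replacement, some nodes may be updated several
times within one cycle and others not at all, which is one of several possible
readings of the scheme.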

In this study, when well-known Boolean problems are explored there are 
only two possible actions and thus only one output node is required. Where 
well-known maze problems are explored there are eight possible actions and
accordingly three required output nodes. An extra "matching" node is also required
to enable RBNs to (potentially) only match specific sets of inputs. If a given RBN
has a logical '0' on the match node, regardless of its output node's state, the
rule does not join [M] (see Figure 2). This scheme has also been exploited within
neural LCS [5]. A 'windowed approach' is utilised where the output is decided
by the most common state over the last W steps up to T. For example, if the
last few states of a node updating prior to cycle T are 0101001 and W = 3, then
the ending node's state would be '0' and not '1'. In this paper, W is set to 3.
Thereafter, match set and action set processing proceeds as standard in XCS 
(the reader is referred to [H] for an algorithmic description of XCS). 
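
The windowed output decision can be written in a few lines. The sketch below
is illustrative only; the function name is hypothetical, and the tie-breaking
rule for even W (ties resolve to 0) is an assumption, since the text only uses
W = 3.

```python
def windowed_state(history, w):
    """Return the most common state among the last w recorded states.
    Ties (possible only for even w) resolve to 0 by assumption."""
    window = history[-w:]
    ones = sum(window)
    return 1 if 2 * ones > len(window) else 0

# The example from the text: states 0101001 with W = 3.
states = [0, 1, 0, 1, 0, 0, 1]
decided = windowed_state(states, 3)
```

With W = 3 the last three states are 0, 0, 1, so the decided state is 0 even
though the final recorded state is 1.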

When covering is necessitated, a randomly constructed RBN is created and 
then executed for T cycles to determine the status of the match and output 
nodes. This procedure is repeated until an RBN is created that matches the 
environment state. 

Parameter self-adaptation was first explored in LCS by Bull et al. [5] wherein
the mutation rate is a locally evolving entity in itself; each rule has its own
mutation rate $\mu$. Mutation only is used here and applied to the node's truth table
and connectivity map at rate $\mu$. A node's truth table is represented by a binary
string and its connectivity by a list of K integers in the range [1, N]. Since each
node has a given fixed K value, each node maintains a binary string of length
$2^K$ which forms the entries in the look-up table for each of the possible $2^K$ input
states of that node, i.e., as in the aforementioned work of [30] on evolving CAs,
for example. These strings are subjected to mutation on reproduction at the
self-adapting rate $\mu$ for that rule. Hence, within the RBN representation, evolution
can define different Boolean functions for each node within a given network rule,
along with its connectivity map. Specifically, each rule has its own mutation
rate stored as a real number and initially seeded uniform randomly in the range
[0.0, 1.0]. This parameter is passed to its offspring. The offspring then applies its
mutation rate to itself using a Gaussian distribution, i.e., $\mu \leftarrow \mu e^{N(0,1)}$, before
mutating the rest of the rule at the resulting rate.
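
A minimal sketch of this self-adaptive mutation step is given below. It is an
illustration under assumptions, not the authors' code: the rate is clamped to
1.0 so that it can be used as a probability, every connection is treated as
mutable (the fixed first connection of input nodes is ignored), and the
function names are hypothetical.

```python
import math
import random

def adapt_rate(mu, rng):
    """Log-normal self-adaptation: mu <- mu * exp(N(0, 1))."""
    return mu * math.exp(rng.gauss(0.0, 1.0))

def mutate_rule(tables, conns, mu, n_nodes, rng):
    """Adapt the offspring's rate first, then mutate truth tables and
    connectivity at the resulting rate. Returns new copies."""
    mu = min(1.0, adapt_rate(mu, rng))  # clamp is an assumption
    new_tables = [[bit ^ 1 if rng.random() < mu else bit for bit in table]
                  for table in tables]
    new_conns = [[rng.randint(1, n_nodes) if rng.random() < mu else c
                  for c in node]
                 for node in conns]
    return new_tables, new_conns, mu

rng = random.Random(3)
tables = [[0, 1], [1, 0, 0, 1]]   # K = 1 and K = 2 nodes
conns = [[1], [2, 1]]             # 1-based sources in [1, N]
child_tables, child_conns, child_mu = mutate_rule(tables, conns, 0.5, 2, rng)
```

Because the rate is multiplied by a strictly positive factor, a rate of
exactly zero stays at zero and the offspring is then an unmodified copy.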

Since rules may need a different number of nodes for a given task, the DGP
scheme is also of variable length. Once the truth table and connections have
been mutated, a new randomly connected node is either added or the last added
node is removed with the same probability $\mu$. The latter case
only occurs if the network currently consists of more than the initial number 
of nodes. Thus DGP is temporally dynamic both in the search process and the 
representation scheme. Evolving variable-length solutions via mutation only has 
previously been explored a number of times, e.g., [13]. Traditional GP can be 
seen to primarily rely upon recombination to search the space of possible tree
sizes, although the standard mutation operator effectively increases or decreases
tree size also.
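
The variable-length step of the DGP scheme might be sketched as below. Several
details are assumptions, because the text leaves them open: a resize event
fires with probability equal to the mutation rate, growth and pruning are then
equally likely, a new node draws its K uniformly from 1 to a hypothetical
k_max, and any connection left pointing at a removed node is redrawn.

```python
import random

def resize_network(tables, conns, mu, n_initial, k_max, rng):
    """Add a randomly connected node or remove the last node.
    Sources are 1-based indices in [1, N]. Returns new copies."""
    tables = [list(t) for t in tables]
    conns = [list(c) for c in conns]
    if rng.random() >= mu:
        return tables, conns            # no resize event this time
    if rng.random() < 0.5:
        n = len(tables) + 1             # size after growth
        k = rng.randint(1, k_max)
        conns.append([rng.randint(1, n) for _ in range(k)])
        tables.append([rng.randint(0, 1) for _ in range(2 ** k)])
    elif len(tables) > n_initial:       # never shrink below the initial size
        removed = len(tables)           # 1-based index of the last node
        tables.pop()
        conns.pop()
        for node in conns:
            for j, src in enumerate(node):
                if src == removed:      # repair dangling connections
                    node[j] = rng.randint(1, removed - 1)
    return tables, conns

rng = random.Random(1)
tables, conns = [[0, 1], [1, 0]], [[1], [2]]
for _ in range(50):
    tables, conns = resize_network(tables, conns, 1.0, 2, 3, rng)
```

A newly added node reads from existing nodes (or itself) but nothing reads
from it until a later connectivity mutation points at it.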

Whenever an offspring classifier is created and no changes occur to its RBN
when undergoing mutation, the parent's numerosity is increased and its mutation
rate is set to the offspring's.

5 Experimentation 
5.1 Multiplexer 

We now apply this discrete version of DGP-XCS (dDGP-XCS) to the well-known 
multiplexer task. These Boolean functions are defined for binary strings of length
$l = x + 2^x$ under which the x bits index into the remaining $2^x$ bits, returning
the value of the indexed bit. The correct classification to a randomly generated
input results in a payoff of 1000, otherwise 0. 
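
As a short worked example of the task definition above, the multiplexer and
its payoff can be written as follows. The function names are hypothetical and
the sketch assumes the address bits come first, most significant bit first.

```python
def multiplexer(bits, x):
    """Return the data bit selected by the first x address bits."""
    if len(bits) != x + 2 ** x:
        raise ValueError("input length must be x + 2**x")
    address = 0
    for bit in bits[:x]:
        address = (address << 1) | bit
    return bits[x + address]

def payoff(bits, action, x):
    """1000 for the correct classification, otherwise 0."""
    return 1000 if action == multiplexer(bits, x) else 0

# 6-bit case (x = 2): address bits 1,1 select the last of the four data bits.
answer = multiplexer([1, 1, 0, 0, 0, 0], 2)
```

Here the input 110000 has address 3, so the fourth data bit is returned and
the correct action is 0.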

Figures 3a-3b show the performance of the constructed system on the 6-bit
multiplexer problem updated asynchronously with P = 800, $\nu = \theta_{GA} = 25$,
$\beta = 0.2$, $p_{expl} = 1.0$, T = 25, W = 3, and $N_{init} = 8$ (6 inputs, 1 output, 1 match
node). After [40], performance from exploit trials only is recorded (fraction of
correct responses are shown), using a 50-point running average, averaged over 
ten runs. 

From Figure 3a it can be seen that a near optimal solution is learnt around
35,000 trials and optimality is observed around trial 58,000. The parameter
governing RBN mutation (see Figure 3a) declines rapidly until reaching a minimum
around 40,000 trials, which is shortly after discovering an optimal solution. The
number of (non-unique) rules initially grows rapidly, before declining to around
650. Furthermore, the average degree of connectivity K decreases fractionally,
whilst, on average, each network grows approximately one extra node (see
Figure 3b). This behaviour indicates that the evolutionary process is able to identify
an appropriate typical topology with which to generate complex behaviour, i.e.,
in this case a computation. For other tasks, other values of K may prove
beneficial; high K may be expected in random number generation, for example. It
can be noted that a growth event under which a new node is added into an RBN
is essentially neutral here since the new node receives inputs from the existing
nodes (or itself) on addition but only provides inputs to other nodes after
subsequent connectivity mutations. For comparative purposes, Figure 4 shows the
performance with the same parameters on the 6-bit multiplexer when updated
synchronously. It is shown that the performance is very similar regardless of the
updating scheme and that there is thus apparently very little overhead when
updating asynchronously, with the possible benefits mentioned above.

Figure 2 provides an illustration of a rule generated whilst solving the 6-bit
multiplexer problem when updated asynchronously. There is one new node in
addition to the initial eight. The truth table shows to which state each node
will transition, given each of the possible inputs. For example, the output node
(node 1) has a truth table of '10' which is synonymous with a NOT gate where
if node 3 is in state '0' then the output node will be set to '1', and if node 3 is
in state '1' then the output node will be set to '0'. The truth table of node 3 is
synonymous with an AND gate, etc.

Fig. 3: dDGP-XCS 6-bit Multiplexer Performance. (a) Performance (circle), error
(square), macro-classifiers (triangle) and mutation rate (diamond). (b) Average
number of nodes (circle) and average number of connections (square).

Fig. 4: dDGP-XCS 6-bit Multiplexer synchronous performance (circle), error
(square), macro-classifiers (triangle) and mutation rate (diamond).

The rule has a Prediction of 1000.00 and an Error of 0.0, whilst having 
an Experience of 822, showing that this is a highly accurate rule. Analysis of 
this RBN rule was undertaken by executing it for each of the sixty-four 6-bit 
inputs. Each input was run twenty times with T = 25 and W = 3. The results 
show that for the majority of environment states the network will return a false 
match node, preventing it from being added to [M]. However, the network is 
general as the match node will always return true when the environment states 
are 110000, 110010, 110100, 110110, 111000, 111010, 111100, and 111110. In all 
of those cases the output node always advocates action '0'. In addition, there 
are several environment states for which the match node will only sometimes
return true. However, in all cases when the match node does permit the rule
to be added to [M], the action advocated will always be consistent. There are 
four such additional environment states (010000, 010010, 011000, and 011010) 
for which the rule will match, albeit with a probability less than 50%. 

The rule in Figure 2 was then re-run as before, however using a traditional
synchronous updating scheme. The results of the match node and output nodes 
are extremely similar regardless of the updating mechanism. That is, XCS has 
evolved an RBN which is very robust to the random nature of the asynchronous 
updating, meaning it is accurate even for the relatively rare case of all nodes 
updating concurrently, i.e., the synchronous case. 



5.2 Maze Environments 

In addition to the single-step multiplexer problems, dDGP-XCS is applied to
versions of three well-known multi-step maze environments, Woods1 (see
Figure 5a), Maze4 (see Figure 5b), and Woods101 (see Figure 5c).

Each cell in the maze environments is encoded with two binary bits, where
white space is represented as a '*', obstacles as 'O', and food as 'F'. Furthermore,
actions are encoded in binary as shown in Figure 5d. The task is simply to find
the shortest path to the food (F) given a random start point. Obstacles (O)
represent cells which cannot be occupied. A teletransportation mechanism is
employed whereby a trial is reset if the agent has not reached the goal state
within 50 discrete movements. In Woods1 the optimal number of steps to the
food is 1.7, in Maze4 optimal is 3.5 steps, and in Woods101 it is 2.9.

Fig. 5: Experimental Maze Environments and Encoding. (b) Maze4



Figures 6a-6c show the performance of dDGP-XCS in the Woods1 environment.
The parameters used are identical to those applied in the aforementioned
multiplexer experiments, except that $N_{init} = 20$ (16 inputs, 3 outputs, 1 match
node) (P = 800). As can be seen from Figure 6a, optimality is observed around
2,500 trials. This roughly matches the performance of neural XCS using
self-adaptive constructivism (≈2,500 trials, P = 2000) and is faster than XCS
using messy conditions (≈8,000 trials, P = 800) [22], XCS using Stack-Based
GP conditions (≈10,000 trials, P = 1000) [33], and XCS with LISP S-expression
conditions (≈5,000 trials, P = 800) [H]. Figure 6b shows that there is an
average of 745 (non-unique) rules evolved. In addition, Figure 6b shows that the
mutation rate declines rapidly by 2,800 trials, shortly after the optimal solution
is learnt. Figure 6c shows that on average the networks add one extra node (from
the original 20) and the average number of connections decreases slightly.

Fig. 6: dDGP-XCS Woods1 Performance



Figures 7a-7c present the performance of dDGP-XCS in the Maze4 environment.
The parameters used are identical to those in the Woods1 environment,
however a bigger population limit of P = 2000 is used, reflecting the larger search
space. Optimality is observed around trial 23,000 (see Figure 7a), which is again
similar to the performance observed using a neural XCS with self-adaptive
constructivism (≈23,000 trials, P = 3000) [18]. The average number of rules evolved
is around 1,800 (see Figure 7b). The average number of nodes in the networks
also increases by almost one, and the average number of connections declines
slightly from 3 (see Figure 7c). The parameter governing RBN mutation
(Figure 7b) declines rapidly after 4,000 trials, before finally stabilising after 15,000
trials.

Fig. 7: dDGP-XCS Maze4 Performance



The Woods101 maze is a non-Markov environment containing two communicating
aliasing states, i.e., two positions which border on the same non-aliasing
state and are identically sensed, but require different optimal actions. Thus, to
solve this maze optimally, a form of memory must be utilised (with at least two
internal states). Optimal performance has previously been achieved in Woods101
through the addition of a memory register mechanism in XCS [25], a Corporate
XCS using rule-linkage [35], a neural LCS using recurrent links [7], and by a
form of ACS with explicit memory [44]. Furthermore, in a proof of concept
experiment, the cyclical directed graph from neural programming has been shown
capable of representing rules with memory to solve Woods101, however it was
only found to do so twice in fifty experiments [3].

The simplest form of short-term memory is a fixed-length buffer contain- 
ing the n most recent inputs; a common extension is to then apply a kernel 
function to the buffer to enable non-uniform sampling of the past values, e.g. 
an exponential decay of older inputs [55]. Simple forms of memory are static,
i.e., the memory parameters are fixed in advance and the memory state is thus 
a predetermined function of the input sequence. However, it is not clear that 
biological systems make use of such shift registers. Registers require some inter- 
face with the environment which buffers the input so that it can be presented 
simultaneously. They impose a rigid limit on the duration of patterns, defining 
the longest possible pattern and requiring that all input vectors be of the same 
length. Furthermore, such approaches struggle to distinguish relative temporal 
position from absolute temporal position [11].

The hypothesis of inherent content-addressable memory existing within
synchronous RBN, due to the different possible routes to a basin of attraction,
is here explored for the asynchronous case and extended by simply not resetting
the node states on each step. A significant advantage of this approach is that
each rule/network's short-term memory is variable-length and adaptive, i.e., the 
networks can adjust the memory parameters, selecting within the limits of the 
capacity of the memory, what aspects of the input sequence are available for 
computing predictions [35]. In addition, as open-ended evolution is used, the 
maximum size of the short-term memory is potentially also open-ended, increas- 
ing as the number of nodes within the network grows. 

Here, nodes are initialised at random for the initial random placing in the 
maze but thereafter they are not reset for each subsequent matching cycle. Con- 
sequently, each network processes the environmental input and the final node 
states then become the starting point for the next processing cycle, whereupon 
the network receives the new environmental input and places the network on a 
trajectory toward a (potentially) different locally stable limit point. Therefore, 
a network given the same environmental input (i.e., the agent's current maze 
perception) but with different initial node states (representing the agent's his- 
tory through the maze) may fall into a different basin of attraction (advocating 
a different action). Thus the rules' dynamics are (potentially) constantly affected 
by the inputs as the system executes. 
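
The effect of carrying node states across matching cycles can be shown with a
deliberately tiny example. It is a sketch under assumptions, not the
experimental system: a single node that computes the OR of one input bit and
its own previous state, updated in a fixed order (harmless here, since there
is only one node), with hypothetical names and 0-based addressing in which
connection values below the message length refer to input bits.

```python
def run(state, message, conns, tables, cycles):
    """Run the network on one input, starting from the given node states."""
    state = list(state)
    for _ in range(cycles):
        for i in range(len(state)):
            values = list(message) + state   # input bits first, then nodes
            index = 0
            for src in conns[i]:
                index = (index << 1) | values[src]
            state[i] = tables[i][index]
    return state

def episode(inputs, conns, tables, cycles=3):
    """Process a sequence of inputs without resetting node states."""
    state = [0]
    for message in inputs:
        state = run(state, message, conns, tables, cycles)
    return state

conns = [[0, 1]]            # source 0 = input bit, source 1 = the node itself
tables = [[0, 1, 1, 1]]     # OR gate: the node latches once it has seen a 1
after_history_a = episode([[1], [0]], conns, tables)
after_history_b = episode([[0], [0]], conns, tables)
```

Both episodes end on the same input, 0, yet the final node states differ
because only the first history contained a 1. This is the sense in which
identical perceptions can lead to different behaviour when states are retained.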



Figures 8a-8c show the performance in the Woods101 environment where
all parameters used are identical to those applied in the previous Maze4
environment. As can be seen from Figure 8a, dDGP-XCS, without node resets, is
able to achieve optimal performance in Woods101 after approximately 12,000
trials (this is slower than XCS using an explicit 1-bit memory register (≈7,000
trials, P = 800)). Figure 8b shows the mutation rate and macro-classifiers.
Figure 8c shows the average number of nodes and connections. Optimal
performance is unattainable however when the nodes are reset randomly between
matching (Figure 8d), proving that the system is exploiting the potential for
memory within asynchronous RBN here. The mechanism works within XCS
because rules/RBN experience each input but need not match on each cycle. Hence
for the ambiguous states they remain accurate for the payoff received on
providing the action but do so having processed the previous input in an appropriate
way, potentially without matching.

Fig. 8: dDGP-XCS Woods101 Performance



6 Conclusions 



In this paper a form of XCS has been presented with which to design asyn- 
chronous random Boolean networks. It has been shown that XCS is able to 
design ensembles of RBN that collectively solve a computational task under a 
reinforcement learning scheme. In particular, it has been shown possible to ex- 
ploit the inherent dynamics of the representation scheme to solve a non-Markov 
maze, i.e., without extra mechanisms. Current research is exploring the possi- 
bilities of DGP as a general representation scheme by which to solve complex 
problems with LCS. 